Deep Learning - Project 1 Report - Convolutional Neural Networks

Kacper Trębacz 145453

Jan Gruszczyński 145464

Part 1:

Task 1:

As this was our first attempt at image classification, we took up the suggestion and decided to use the Caltech 101 dataset.

To download the dataset, one can use the commands written below.

We removed the class BACKGROUND_Google manually. Upon inspection, the category appears to consist of random images downloaded from Google without any pattern or distinct features; the class also contains many images similar to those in other classes.

Loading the data:

Step verification:

Task 2:

Step verification:

Task 3:

Step verification:

Proportions are correct.

Task 4:

Step verification:

Accuracy is usually between 3% and 5%.
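This matches chance level for an untrained classifier: with N roughly equiprobable classes, random guessing yields about 1/N accuracy. A minimal check (assuming the roughly 26-class subset we train on later; the exact class count here is an assumption):

```python
# Chance-level accuracy for an untrained classifier is about 1/N,
# where N is the number of (roughly balanced) classes.
def chance_accuracy(num_classes: int) -> float:
    return 1.0 / num_classes

# With ~26 classes, random guessing lands near the 3-5% we observe.
print(round(chance_accuracy(26) * 100, 2))  # ~3.85
```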

Task 5

Step verification:

Our neural network almost always has problems with one or two classes. Sometimes it is ewer, sometimes buddha, or both. This time it appears that laptop defeated it. At this stage of the experiment, we suspect it might be due to the structure of the neural network. Interestingly, it does not matter how many classes we train the network on, as long as the number is greater than around 25.

Task 6:

Step verification:

The final validation accuracy is the same, and the confusion matrices look the same.

Task 7:

  1. The class BACKGROUND_Google was removed. We made the convolution block into a separate custom block to make the tests in Part 2 easier.
  2. We achieved around 70% accuracy on average on the validation set, so the results are not bad, but there is still room for improvement. The confusion matrix also seems to be fairly balanced.
  3. The model clearly overfits. To prevent that, we could, for example, reduce the number of parameters (e.g. by increasing the number of convolutional blocks, though this approach did not work, as shown in Part 2). One could also use weight initialization (the validation accuracies often depend heavily on luck). We could also tinker with dropout (Part 2), or use regularization.
  4. That is the most interesting finding. On average, the model cannot recognize one particular class; this time it struggled with the class laptop, as written above. The unrecognizable class seems to be picked at random (by random we mean the order in which batches of data are provided to the network during fitting). The classes ibis, buddha, and bonsai are most often the ones the network cannot recognize; when they are recognized, they are often confused with each other.
  5. A change in the structure of the network seems the most promising.
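As a sketch of the regularization idea mentioned in point 3 (an illustration, not our actual training code): L2 regularization adds a penalty proportional to the sum of squared weights to the loss, discouraging the large weights that often accompany overfitting.

```python
import numpy as np

def l2_penalized_loss(data_loss: float, weights: list, lam: float = 1e-4) -> float:
    """Total loss = data loss + lam * sum of squared weights (L2 penalty)."""
    penalty = sum(float(np.sum(w ** 2)) for w in weights)
    return data_loss + lam * penalty

# Larger weights are penalized more, nudging the optimizer toward simpler models.
w_small = [np.full((3, 3), 0.1)]
w_big = [np.full((3, 3), 1.0)]
assert l2_penalized_loss(0.5, w_small) < l2_penalized_loss(0.5, w_big)
```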

Part 2:

To compare the networks, we created this scary-looking function. With it, one can test every adjustable hyperparameter of our model (number of filters, filter sizes, activation functions, dropout rates, pool sizes, number of blocks, dense layer structure).
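As an illustration of the idea behind it (a sketch, not our exact implementation, and the option lists below are hypothetical): the Cartesian product of hyperparameter options can be enumerated like this, with each resulting configuration passed to a model-building-and-training routine.

```python
from itertools import product

# Hypothetical option lists; the real function accepts more hyperparameters
# (filter sizes, activation functions, pool sizes, dense layer structure, ...).
grid = {
    "num_filters": [16, 32],
    "dropout": [0.0, 0.2],
    "num_blocks": [3, 4],
}

def configurations(grid: dict) -> list:
    """Expand a dict of option lists into one dict per hyperparameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

configs = configurations(grid)
print(len(configs))  # 2 * 2 * 2 = 8 configurations to train and compare
```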

It is worth noting that we performed the tests below more than once, to check that the result was not caused by a lucky starting data selection.

Task 1 Different number of classes:

The model worked best on the 2-class classification problem, which is not that surprising. What is surprising is that the third-worst performance was achieved when classifying just 5 classes. After further inspection, we think it was caused by the inclusion of a class that is sometimes difficult for the network to recognize - the watch - although later models did not have a problem with it. What is more, validation accuracy increases but then suddenly drops with each consecutive epoch. It seems that the model started to overfit severely, similarly to the 3-class case, but the data for the first epochs was chosen more luckily in that model. The other learning curves seem normal and not that interesting.

As expected, the more classes included, the longer the average training time and the lower the validation accuracy.

Our own additional task: different structures of dense layers

To tinker a little with the structure of the network, we decided to test the structure of the dense layers in more detail. How much does the number of neurons in those layers matter? Can there be too few of them? The test showed that the [500, 500] structure seemed the best, but we chose [100, 256] because it has far fewer parameters and its training time was twice as fast. Much simpler, but almost as good.
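The parameter saving can be verified with simple arithmetic. Assuming a hypothetical flattened feature vector of size F feeding the dense stack and C output classes (both values below are illustrative assumptions, not our actual dimensions):

```python
# Parameter count of a stack of fully connected layers:
# each Dense(n) over an input of size m has m*n weights + n biases.
def dense_params(input_size: int, layer_sizes: list, num_classes: int) -> int:
    total, m = 0, input_size
    for n in list(layer_sizes) + [num_classes]:
        total += m * n + n
        m = n
    return total

# Hypothetical flattened conv output size and our 26-class subset.
F, C = 4096, 26
print(dense_params(F, [500, 500], C))  # [500, 500] stack
print(dense_params(F, [100, 256], C))  # [100, 256] stack -- far fewer parameters
```

With these assumed dimensions the bulk of the parameters sits in the first dense layer, which is why shrinking it helps so much.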

To be sure, we searched once again in the close neighbourhood of the best result:

It looks like the best sizes for the dense layers are [128, 200], which is a bit confusing: intuition suggests that a structure where the number of neurons decreases (rather than increases) from layer to layer would work better.

On the last day, we figured we might also check different numbers of layers. This test was performed after all the other tests.

Here we can see that one dense layer seems to work a bit better than the two-layer structure. We performed this test a couple more times to check that it was not a coincidence; it seems it was not. One layer seems to be better for this specific problem, given our model structure. This version of the dense layer structure also struggles less often with the problem described above (not being able to recognize one class at all). Furthermore, the confusion matrices look smoother.

Below, we also wanted to test what would happen if we added more layers.

A tendency can be observed: the more layers in the dense structure, the worse the performance of the model. We suspect this is because of the small number of classes in the dataset (Caltech 101 is a small dataset, and we prune it further to 26 classes).

Task 3 Dropout test

Here we try to find the best dropout value.

The results are very confusing. Apparently, the best dropout seems to be no dropout at all. However, 0.2 seems pretty close, so we will examine a few more values near 0 and 0.2. In previous tests, dropout around this rate seemed best on average. This result is very interesting, as every network was trained on the same data. Also, the first model, as mentioned before, was for some reason unable to recognize one class; this time it was scorpion.

Values around 0 and 0.2 are still the best. Although 0 is slightly better than 0.2, we decided to go with 0.2; it might help when testing more layers. We can see that the training time is a little higher, but the learning curve is smoother with the higher dropout.
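For reference, the mechanism being tuned here, as a minimal numpy sketch of inverted dropout (not the framework's implementation): during training, each activation is zeroed with probability `rate` and the survivors are scaled by 1/(1-rate), so the expected activation is unchanged and no rescaling is needed at inference.

```python
import numpy as np

def dropout(x: np.ndarray, rate: float, training: bool,
            rng=np.random.default_rng(0)) -> np.ndarray:
    """Inverted dropout: zero units with probability `rate`, scale survivors."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.ones((4, 5))
y = dropout(x, rate=0.2, training=True)
# Surviving entries are scaled to 1 / (1 - 0.2) = 1.25; the rest are zero.
assert set(np.unique(y)).issubset({0.0, 1.25})
assert np.array_equal(dropout(x, rate=0.2, training=False), x)
```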

Task 6 Comparison of the models for different preprocessing approaches

This was a bit difficult to implement, but most certainly interesting. In the future, we could add additional preprocessing here, such as Gaussian blur.

As we expected, standardizing the images gives the best result; it improved the network's results by ~7%. What is surprising is that we achieved almost as good results by simply subtracting the mean (after repeating the test, it achieved just slightly lower validation accuracy than standardization). The network performs significantly worse on raw data, so it is really beneficial to perform this simple operation before feeding images through the network, although the difference is not huge. And as always, each model struggles with a certain class; this time it was again scorpion, then brain.
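The two winning preprocessing variants can be sketched in numpy. These are per-image versions; whether the statistics are computed per image or over the whole training set is an assumption here, and our actual pipeline may differ.

```python
import numpy as np

def subtract_mean(img: np.ndarray) -> np.ndarray:
    """Center the image: zero mean, original scale preserved."""
    return img - img.mean()

def standardize(img: np.ndarray) -> np.ndarray:
    """Zero mean and unit variance (a small epsilon guards flat images)."""
    centered = img - img.mean()
    return centered / (img.std() + 1e-7)

img = np.random.default_rng(0).uniform(0, 255, size=(64, 64, 3))
assert abs(subtract_mean(img).mean()) < 1e-9
assert abs(standardize(img).std() - 1.0) < 1e-3
```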

Task 7 Different activation functions

*for convolutional layers

We can clearly see that the ReLU function is the best for the convolutional layers. What is surprising is that its training time was almost the highest; however, it ran more epochs, and the last 15 of them were close to the optimum, which means it is more stable. GELU was almost as good as ReLU, but less stable toward the end. The sigmoid function also did pretty well, but after a talk with Doctor Krzysztof Martyn, we decided to use ReLU.
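For reference, the three activations compared above, as numpy sketches (GELU uses the common tanh approximation, which may differ slightly from the framework's exact version):

```python
import numpy as np

def relu(x):
    # max(0, x): cheap, and keeps gradients alive for positive inputs
    return np.maximum(0.0, x)

def sigmoid(x):
    # squashes inputs into (0, 1); saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def gelu(x):
    # tanh approximation of GELU: a smooth, non-monotonic relative of ReLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.array([-2.0, 0.0, 2.0])
assert np.array_equal(relu(x), [0.0, 0.0, 2.0])
assert abs(sigmoid(0.0) - 0.5) < 1e-12
assert gelu(0.0) == 0.0  # like ReLU, GELU passes zero through unchanged
```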

Task 9 Different number of convolutional blocks

It seems that the initial requirement of the project was correct: after a multitude of tests, 3 blocks always proved to be the best. However, 4 blocks are only slightly worse and have a smoother learning curve. We could not reach as high a score as in the activation function testing, probably because in that test we were lucky to start with a model that allowed us to reach such high accuracy. We deduce that the data and the initial weight initialization overall play the biggest role in training this kind of neural network.
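One plausible reason more blocks stop helping (an assumption on our part, not something we measured): each block typically ends with a pooling step that halves the spatial resolution, so with small inputs there is little resolution left after too many blocks. Assuming hypothetical 128x128 inputs and one 2x2 pooling per block:

```python
# Spatial side length of the feature map after n conv blocks,
# assuming each block ends with a 2x2 pooling (halving) step.
def feature_map_size(input_size: int, num_blocks: int, pool: int = 2) -> int:
    size = input_size
    for _ in range(num_blocks):
        size //= pool
    return size

for n in range(1, 7):
    print(n, feature_map_size(128, n))
# After 6 blocks a 128-pixel side is down to 2 pixels -- almost no
# spatial detail left for further convolutions.
```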

After the tests above, we think we can achieve an accuracy of around 80%, which seems good, though not great. There is still room for improvement and for tests of other hyperparameters. In any case, the experiment taught us a lot, even though Kacper already had some experience with neural networks.

Thank you for reading